perm filename A02.TEX[262,RWF] blob sn#870982 filedate 1989-03-08 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00002 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00002 00002	\input macro[1,mps]
C00011 ENDMK
C⊗;
\input macro[1,mps]
\magnification\magstephalf
\baselineskip 14pt
\leftline{\sevenrm A02.Tex[262,rwf]\today}
\leftline{\copyright\sevenrm Robert W. Floyd, March 8, 1989}
\leftline{\sevenrm Unpublished draft}
\bigskip
\noindent CS 262 Midterm Exam Solutions
\medskip
\noindent Problem 2 Solution

Each byte has distribution $P = \langle p↓1,\dots, p↓m\rangle$.  Let
$K = \langle k↓1, k↓2, \dots, k↓m\rangle$, a shorthand.  Let
$B$ be the distribution defined by  $b↓K = {n\choose K} 
p↓1↑{k↓1}\dots p↓m↑{k↓m}$, the probability that the $j↑{th}$ letter of 
the alphabet is used $k↓j$ times, i.e. that the $j↑{th}$ subset of the 
partition is of size $k↓j$.  The expected reduction in entropy is 
$$\displaylines{\quad\sum↓{\mid K\mid = n} b↓k (\log (n!) - \sum↓j\log
(k↓j!))\hfill\cr
\qquad = \sum↓{\mid K\mid = n} b↓K (\log b↓K - \sum↓j k↓j\log p↓j)\hfill\cr
\qquad = - H [B] - \sum↓j E[k↓j]\log p↓j\hfill\cr
\qquad = - H[B] - n\sum↓j p↓j\log p↓j = nH[P] - H[B].\hfill\cr}$$
Since there are only $n+m-1\choose m-1$ possible values of $K$, 
$H[B] < \lg {n+m-1\choose m-1}\leq (m-1) \lg (n+1)$, and the asymptotic entropy
reduction is $n H[P] - O (\lg n)$.
\medskip
\noindent Problem 3 Solution

$$\eqalign{f(2n) &= 2f(n) + 2n-1\cr
L(f) &= f(2n) - 2f(n) = 2n-1 = g\cr}$$
$$\halign{#&#\hfil\cr
For & $f=1$, $g= -1$\cr
& $f=n$, $g=0$ \hbox{(useful to meet boundary conditions)}\cr
& $f=2↑n$, $g=4↑n - 2↑{n-1}$ \hbox {is not helpful}\cr
& $f= \lg n$, $g= 1-\lg n$\cr
& $f=n \lg n$, $g = 2n$\cr}$$

Combining the first and last,
$$f = n\lg n + 1,\, g = 2n-1$$
To fit the condition $f(1) = 0$, add $f=n$, $g=0$, to get
$$ f= n(\lg n-1) + 1, g=2n-1.$$
Summation factor method:
$$\eqalign {f(2n)\over 2n & = {2f(n)\over 2n} + 1 - {1\over 2n}\cr
h(2n) & = h(n) + 1 - {1\over 2n} \qquad n=2↑i\qquad \phi(i) = h(2↑i)\cr
\phi (i+1) & = \phi (i) + 1 - {1\over 2↑{i+1}}\cr
f(1) = 0; h(1) & = 0;\, \phi (0) = 0\cr
\phi(i) &= i - \left ({1\over 2}+ {1\over 4}+\dots + {1\over 2↑i}\right ) = 
i-1 + {1\over 2↑i}\cr
h(n) &=\lg n - 1 + {1\over n}\cr
f(n) & = n\lg n - n+1.\cr}$$
\noindent Additive method: for each $1\leq 2↑i < n = 2↑a,$ merge $2↑i$ with
$2↑i$ to form $2↑{i+1}$, $2↑{a-i-1}$ times, at cost $2↑{i+1} -1$.

For each such $i$, total cost is $(2↑{a-i-1})(2↑{i+1}-1) = 2↑a - 2↑{a-i-1}
= 2↑a(1-2↑{-i-1}).$

Sum over $0 \leq i < a$:
$$\displaylines{\quad a\cdot 2↑a - 2↑a(2↑{-1} + \dots + 2↑{-a}) =\hfill\cr
\quad a\cdot 2↑a - 2↑a(1-2↑{-a}) = a\cdot 2↑a - 2↑a + 1\hfill\cr
\quad = n(\lg n-1) + 1.\hfill\cr}$$
\bigskip
\noindent Problem Number 4:

Try $f(i)= {i\choose p}$ for each non-negative integer p. Using the identity
$${n \choose i} {n\choose p} = {n \choose p} {{n-p} \choose {i-p}}$$
\bigskip
$$\eqalign {g_n &=L(f↓n) = f_n - 2 \sum_i {n \choose i} {f↓i \over 2↑n} \cr
	       &= {n \choose p} (1- 2↑{1-p})} $$
For $p=0$, $f_n=g_n=-1$ and for $p=1, f_n=n, g_n=0$. Linear combinations of the
above give all polynomials $g_n$ of the form
 constant $+$ multiple of $n(n-1)$
i.e., such that $g_0 = g_1$.

There is no polynomial $f$ for which $g_n = n$ because such an $f$ must be 
$O(n \log n)$. Therefore, the above family is the largest possible.

\noindent (The results above cannot be used to find the expected number of 
calls in radix sort, because the recurrence is wrong for a correctly coded 
radix sort).
\vfill\eject
\noindent Problem number 5:

The ``sample'' size is $S={(n-1)\over 2}= 2↑{p-1} - 1$. The rest of the data
set has cardinality $t=2↑{p-1}$. The cost (measured in comparisons) of partitioning 
is
$p-1$ per datum, for a total cost of $(p-1)t$.

Consider any two data $x$ and $y$ not in the sample, where $x$ precedes $y$.
They are inverted after partitioning if, when inserted among the sample, $x$
ranks just after $y$. There are ${{(s+2)}↑{\underline 2} }$ ranks they could have, and $s+1$
of these are inversions for probability ${1\over {s+2}} = {1\over {t+1}}$ 
inversion.

The number of pairs $x$ and $y$, $x$ preceding $y$, is ${t \choose 2}$. The
expected number of inversions is 
$${{t(t-1)} \over {2 (t+1)}} = {t\over 2} -1+{1 \over {t+1}}$$
Assume an insertion sort algorithm that, on $k$ data with $i$ inversions, uses
$k-1+i$ comparisons. Since the cardinalities of the subproblems after 
partitioning add to $t$ and their number is also $t$, the expected number
of comparisons to finish up is
$$\sum (k-i+1) = t-t +t/2-1+ {1\over {t+1}}.$$
The total top level cost is then $t(p+1)+t/2-1+{1 \over {t+1}}$. The recursions
include sorting $2↑i -1$ data, for $2 \le i \le p$, at cost
$$2↑{i-1} (i+1) + 2↑{i-1} -1 + {1 \over {2↑{i-1} + 1}}.$$
The total cost is
$$\eqalign {&\sum↓{2\le i \le p} i 2↑{i-1}  + 2↑i -1 + {1 \over 
{2↑{i-1} + 1}}\cr
	=&(p+1)n + O(1)}$$
\bigskip
\noindent Problem Number 6:

The coefficient of $f_n$ on the left is zero if $n<s$. On the right, the 
coefficient is zero if $n<a$. Therefore, any function $f_n$ that is zero for
$n\ge a$ is a solution. The delta functions, $f_n= \delta _{kn}$ for each
$0\le K\le a-1$ are one instance. So are $K \choose n$. My earliest solution 
to this one was more complicated. I hope you all found an easy one.
\bigskip
\noindent Thanks to Heping for finding the false assumption in Problem 1.
\bye